Filter bubbles, echo chambers, and online news consumption
Uploaded by: Alt-Tab
Upload date: 2019-04-26 15:55:25

Comments:

Very good article on segregation phenomena as measured in online news consumption.

Introduction

  • question asked: impact of technological changes on ideological segregation
  • two conflicting hypotheses: either technology increases the consumption of like-minded opinions (echo chambers; e.g. Sunstein, 2009), or access to a broader spectrum of information leads to more consumption of opposing opinions (e.g. Benkler, 2006)

  • work proposed: study 50,000 anonymized US users who regularly consume online news

  • ML algorithms identify hard news, then divide it into descriptive reporting vs opinion pieces
  • defines ideological segregation as the expected difference in the share of conservative news consumed by two random individuals
  • observes that segregation tends to be higher when users come from social media
  • observes that individual users tend to read news only from one side of the spectrum
  • observes, counter-intuitively, that reading the opposite side tends to happen more often through the channels with the highest segregation (social, search)
  • descriptive reporting corresponds to about 75% of the traffic
  • online news consumption is still dominated by mainstream media

Data and methods

  • data collection: from the Bing toolbar for Internet Explorer (IE) => 1.2M US users from March to May 2013, 2.3 billion pages (median: ~1000 pages per user)
  • focus on 50,000 regular news readers
  • selection bias: individuals who agree to share their data; IE users are in general older
  • test representativeness by measuring the Spearman correlation between outlet consumption in the dataset and the Quantcast and Alexa rankings: 0.67 and 0.7, while Spearman(Quantcast, Alexa) ~ 0.64
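The representativeness check above is a rank correlation between two orderings of the same outlets. A minimal sketch in plain Python, assuming the inputs are per-outlet traffic counts; the numbers below are made up for illustration and are not from the paper:

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical page-view counts for five outlets in two data sources.
toolbar_views = [900, 750, 400, 120, 60]
reference_views = [1000, 500, 650, 90, 100]
rho = spearman(toolbar_views, reference_views)
```

Only the ordering of the outlets matters here, so the two sources may use different units (page views, unique visitors, rank positions).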
identifying news and opinion articles:
  • use the Open Directory Project => identify ~8000 domains categorized as news, politics, etc.
  • these include major national sources, important regional outlets, and important blogs
  • isolate 4.1M articles, not all of them ideologically relevant (e.g. sports, weather, ...) => use ML to isolate 1.9M "front section news" articles, of which 200,000 are opinion stories (Tab1 lists the terms most predictive of each category)
measuring the political slant of publishers
  • impossible to label manually, and no easy way to do it automatically for all 1.9M articles => assign each article the slant of its outlet
  • use the slant of the outlet's readers, inferred from the presidential election vote in their location, itself inferred from the IP address
  • robustness check: the top 20 listed in Tab2 is consistent with common knowledge and with previous studies (Gentzkow and Shapiro, 2011)
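The audience-based slant described above can be sketched as an average over visits. This is a simplified illustration, assuming each visit has already been mapped to a county and that the county score is the conservative vote share; the county names, outlets and numbers are hypothetical:

```python
from collections import defaultdict

# Hypothetical county-level conservative vote shares (0 = all left, 1 = all right).
county_vote_share = {"county_a": 0.35, "county_b": 0.60, "county_c": 0.50}

# Hypothetical visits: (outlet, county inferred from the reader's IP address).
visits = [
    ("outlet_x", "county_a"),
    ("outlet_x", "county_a"),
    ("outlet_x", "county_c"),
    ("outlet_y", "county_b"),
    ("outlet_y", "county_b"),
    ("outlet_y", "county_c"),
    ("outlet_y", "county_unknown"),  # no vote data: skipped below
]


def outlet_slant(visits, county_vote_share):
    """Average the vote share of the visitors' counties, per outlet."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for outlet, county in visits:
        if county not in county_vote_share:
            continue  # ignore visits that cannot be geolocated to a known county
        totals[outlet] += county_vote_share[county]
        counts[outlet] += 1
    return {outlet: totals[outlet] / counts[outlet] for outlet in totals}


slant = outlet_slant(visits, county_vote_share)
```

Every article then inherits the score of its outlet, which is exactly the approximation listed under the limits of the study.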
inferring consumption channels
  • 4 info channels: direct (visit the domain), social (Facebook, Twitter, email), search (Google, Bing, Yahoo), aggregator (Google News)
  • use the referrer domain to define the channel (interpretation problems to solve, e.g. if ref=Facebook and 4 articles are read, do all of them count as coming from social?)
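A referrer-based classification of a single page view could look like the sketch below. The domain lists, the "other" fallback and the function name are assumptions made for illustration; the paper's actual rules, in particular for chains of several articles after one referral, are not reproduced here:

```python
from urllib.parse import urlparse

# Hypothetical domain lists; webmail hosts are counted as social (email sharing).
AGGREGATOR_DOMAINS = {"news.google.com"}
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "t.co", "mail.google.com", "mail.yahoo.com"}
SEARCH_DOMAINS = {"google.com", "bing.com", "yahoo.com"}


def _matches(host, domains):
    """True if host is one of the domains or a subdomain of one of them."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_channel(referrer):
    """Map the referrer URL of one page view to a consumption channel."""
    if not referrer:
        return "direct"  # no referrer: the user went to the outlet directly
    host = (urlparse(referrer).hostname or "").lower()
    # Check the most specific hosts first: news.google.com and mail.google.com
    # would otherwise be caught by the generic google.com search rule.
    if _matches(host, AGGREGATOR_DOMAINS):
        return "aggregator"
    if _matches(host, SOCIAL_DOMAINS):
        return "social"
    if _matches(host, SEARCH_DOMAINS):
        return "search"
    return "other"
```

For example, `classify_channel("https://www.google.com/search?q=news")` falls in the search channel, while an empty referrer is treated as direct. The open question in the notes (how to label the following articles of the same session) needs an extra session-level rule on top of this.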
limiting to active news consumers:
  • limit = at least 10 news articles and 2 opinion pieces during the 3-month period => from 1.2M to 50,000 users (about 4%)
  • remark: some conclusions still hold with a looser threshold

Results

overall segregation

  • individual polarity = average of the polarities of the outlets consumed
  • segregation = distance between the polarity scores of two random users
  • naive estimation insufficient => use of a hierarchical Bayesian model
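The naive estimate mentioned above can be sketched as follows, assuming the distance is the root-mean-square difference between two users' polarities (the choice that makes the sqrt(2) * sigma_p formula used in these notes exact). Outlet names, scores and reading histories are hypothetical:

```python
from itertools import combinations
from statistics import mean

# Hypothetical outlet polarity scores (0 = left, 1 = right).
outlet_polarity = {"outlet_left": 0.30, "outlet_center": 0.48, "outlet_right": 0.60}

# Hypothetical reading histories: one outlet name per article read.
histories = {
    "u1": ["outlet_left", "outlet_left"],
    "u2": ["outlet_center"],
    "u3": ["outlet_right", "outlet_center"],
}


def user_polarity(outlets_read, outlet_polarity):
    """Average polarity of the outlets behind the articles a user read."""
    return mean(outlet_polarity[o] for o in outlets_read)


def naive_segregation(polarities):
    """Root-mean-square polarity difference over all pairs of distinct users."""
    squared_diffs = [(a - b) ** 2 for a, b in combinations(polarities, 2)]
    return mean(squared_diffs) ** 0.5


polarities = [user_polarity(h, outlet_polarity) for h in histories.values()]
segregation = naive_segregation(polarities)
```

With only one or two articles per user, each `user_polarity` value is a noisy estimate, which is one plausible reason why this direct computation is not reliable and a hierarchical model is used instead.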
Bayesian model
  • process standard in the literature? see Gelman and Hill, 2007
  • look for sigma_d, the global dispersion
  • polarity of user i assumed to follow a normal distribution with latent parameters
  • evaluate parameters using approximate marginal likelihood estimation
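One way to write a two-level normal model consistent with these notes is given below. The symbols x_{ij}, mu_i and mu_p are notation introduced here, and the exact specification in the paper may differ:

```latex
\begin{align*}
x_{ij} \mid \mu_i &\sim \mathcal{N}(\mu_i, \sigma_d^2)
  && \text{polarity of the $j$-th article read by user $i$} \\
\mu_i &\sim \mathcal{N}(\mu_p, \sigma_p^2)
  && \text{latent polarity of user $i$}
\end{align*}
```

Under this reading, sigma_d is the spread of a user's articles around that user's own polarity, and sigma_p is the spread of user polarities across the population.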
segregation
  • distribution of user polarities obtained: see fig2
  • segregation = sqrt(2) * sigma_p = 0.11
  • 2/3 of the scores are between 0.41 and 0.54 => most people are moderate
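The sqrt(2) factor follows from taking the root-mean-square difference between the polarities p_i and p_j of two independent random users who share the same mean and the same standard deviation sigma_p (p_i, p_j are notation introduced here):

```latex
\begin{align*}
\mathbb{E}\big[(p_i - p_j)^2\big]
  &= \operatorname{Var}(p_i - p_j) + \big(\mathbb{E}[p_i - p_j]\big)^2
   = 2\sigma_p^2 + 0 \\
\sqrt{\mathbb{E}\big[(p_i - p_j)^2\big]} &= \sqrt{2}\,\sigma_p \\
\sigma_p &= \frac{0.11}{\sqrt{2}} \approx 0.078
\end{align*}
```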

segregation by channel and article subjectivity

  • problem of data sparsity exacerbated by splitting the data into channels
  • but for a given user, polarity is probably correlated across channels
  • same type of Bayesian model but with an 8-dimensional vector
  • 8 dimensions = 4 channels * 2 classes (opinion and report)
results in fig3: segregation per channel
  • trend: segregation effect stronger for opinions
  • trend: social media tend to increase segregation effects
  • strongest segregation for search; possible explanations: 1) search queries are already oriented, 2) once the query is formulated, users choose to read like-minded media
  • since these technologies give access to a large variety of media, they are what causes the segregation effect
  • trend: aggregators => less segregation
  • interpretation of the weakness of the overall segregation effect: even after pre-filtering, many news items are not polarizing

  • general conclusion: there is a filter bubble effect but still limited

ideological isolation

two conflicting hypotheses
  • moderate polarization and individuals consume a large spectrum of opinions
  • moderate polarization but individuals consume a narrow spectrum of opinions
  • dispersion sigma_d = 0.06, very small => rather the second hypothesis
  • explanation: 78% of users use only 1 source, 94% one or two sources
  • remark: still true for users with a larger number of sources
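To see why single-source readers pull the dispersion down, here is a naive per-user version of it: the standard deviation of the outlet scores behind a user's articles. This is only an illustration; the sigma_d reported in the notes comes from the Bayesian model, not from this direct computation, and the scores are hypothetical:

```python
from statistics import pstdev


def user_dispersion(article_slants):
    """Population standard deviation of the outlet scores of a user's articles."""
    return pstdev(article_slants)


# A user who reads five articles from a single outlet has zero dispersion.
single_source = user_dispersion([0.48] * 5)

# A user who splits reading between two distant outlets has a large one.
two_sources = user_dispersion([0.30, 0.60])
```

Since article scores are outlet scores, any user with one outlet contributes exactly zero, whatever the number of articles read.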
Dispersion per user and per channel: Fig4a
  • more or less identical for news and opinions
  • direct: lowest dispersion, search: highest dispersion
Dispersion per individual polarity: Fig4b
  • the most polarized individuals are also the ones with the highest dispersion
Does it mean that highly polarized individuals see opposite opinions? (Fig5)
  • test by ranking media with a score l from left (0) to right (1)
  • define opposing partisan exposure o_i = min(l_i, 1 - l_i)
  • fig5: percentage of exposure to opposite opinion articles, depending on the channel and on the user polarity
  • lower than 20% in all cases
  • weaker for opinion pieces than reports
  • lowest for most partisan users
  • conclusion: users read ideologically homogeneous outlets, and partisan users are in general exposed only to their side of the spectrum
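The opposing partisan exposure can be sketched as below, assuming l_i is the fraction of user i's articles that come from left-leaning outlets. The 0.5 cut-off between left and right and the function name are hypothetical choices made for this example:

```python
def opposing_exposure(article_slants, threshold=0.5):
    """o = min(l, 1 - l), with l the share of articles from left-leaning outlets."""
    if not article_slants:
        raise ValueError("the user must have read at least one article")
    # An article counts as left-leaning if its outlet score is below the cut-off.
    left_count = sum(1 for s in article_slants if s < threshold)
    left_share = left_count / len(article_slants)
    return min(left_share, 1 - left_share)


# Three articles from a left outlet and one from a right outlet.
mostly_left = opposing_exposure([0.30, 0.30, 0.30, 0.60])

# Only one side of the spectrum: no exposure to the other side.
one_sided = opposing_exposure([0.30] * 5)
```

By construction the value lies between 0 (one side only) and 0.5 (perfectly balanced reading), so the "lower than 20%" reported for fig5 means a clearly one-sided diet.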

Discussion and conclusion

Overall
  • in general, more segregation with social media and web search than with direct consumption
  • however, the channels with more segregation are, counter-intuitively, associated with a wider range of opinions
  • the majority of online behaviors mimic traditional reading habits: most users go to their favorite outlets (which are predominantly mainstream)
Limits
  • measure slant of the outlet, not of an article
  • focus only on consumption, not on the vote itself
  • no measure of the amplifying effect of social media or search engines
Alt-Tab at 2019-04-26 16:21:25
Edited by Alt-Tab at 2019-04-26 17:44:19

I agree that it is a very good article. I wish the Algodiv project had the same data…

The main contributions are the results on echo chambers: people are in echo chambers, but the influence remains limited because all users massively consume news content from mainstream media.

I find the "segregation" metric interesting; it should be compared (and merged?) with other diversity-related metrics.

I see a few more limits with the methodology. The main one is the reliability of the slant of a news outlet, obtained from the location of the readers' IP addresses (not exactly reliable) matched with the county-level results of the 2012 presidential election(!). Besides the reliability of the metric, it is very hard to get any intuition for this made-up scale. Is a 0.11 interval large or small? What does it mean to have BBC at 0.3 and FoxNews at 0.59? Is a difference between 0.3 and 0.32 truly the same as between 0.48 and 0.5?
